
Support tables with non-lowercase names in Druid #7197

Closed
dheerajkulakarni wants to merge 1 commit into trinodb:master from miqdigital:trino-6850

Conversation

@dheerajkulakarni
Member

This PR addresses #6850

Changes in this PR

  1. Call getRemoteTable and getRemoteSchema before filtering in getTableHandle(ConnectorSession session, SchemaTableName schemaTableName).
  2. Add test cases for this in "TestCaseInSensitiveMapping".
  3. Add a copyAndIngestTpch function to copy an existing dataSource under a different dataSource name.

@cla-bot cla-bot Bot added the cla-signed label Mar 7, 2021
@dheerajkulakarni
Member Author

dheerajkulakarni commented Mar 7, 2021

Hey @findepi @hashhar @Praveen2112, I've raised a draft PR as I need some inputs from you to go ahead.

The block of code I highlighted in the attached image (io.trino.plugin.druid.BaseJdbcClient#toRemoteSchemaName) is failing the DruidIntegrationSmokeTest cases. When case-insensitive-mapping is false, which seems to be the default behavior, this block of code gets executed, and storesUpperCaseIdentifiers() appears to return true by default, so the schema name and table name are returned in upper case. Because of that, Druid cannot find the table or schema, as they are originally in lowercase. Before this change these remote functions were never called, so there was no problem at all. Do you think this property has to be set explicitly to false somehow in the test cases? Also, I saw that the TrinoMetaData class has this property set to false by default, so isn't this supposed to return false by default?

[Screenshot 2021-03-06 at 12:19:25 PM: the highlighted block in BaseJdbcClient#toRemoteSchemaName]
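To make the failure mode concrete, the case-folding logic being described can be sketched roughly like this (a simplified illustration, not the actual BaseJdbcClient code; names and signatures are illustrative):

```java
import java.util.Locale;

public class RemoteNameMappingSketch
{
    // Sketch of the kind of remote-name mapping being discussed. When
    // case-insensitive matching is disabled, a BaseJdbcClient-style client
    // consults the driver's DatabaseMetaData flags to decide how to fold
    // identifier case. If the driver wrongly reports
    // storesUpperCaseIdentifiers() as true, a lowercase Druid datasource
    // name is folded to upper case and the lookup fails.
    public static String toRemoteName(
            String localName,
            boolean storesUpperCaseIdentifiers,
            boolean storesLowerCaseIdentifiers)
    {
        if (storesUpperCaseIdentifiers) {
            return localName.toUpperCase(Locale.ENGLISH);
        }
        if (storesLowerCaseIdentifiers) {
            return localName.toLowerCase(Locale.ENGLISH);
        }
        // case-preserving driver: pass the name through unchanged
        return localName;
    }

    public static void main(String[] args)
    {
        // The problematic path: the driver claims upper-case storage,
        // so "druid" becomes "DRUID" and the schema is not found.
        System.out.println(toRemoteName("druid", true, false));
    }
}
```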

@hashhar
Member

hashhar commented Mar 7, 2021

I observed TrinoMetaData class which has this property set as false by default, so isn't this supposed to return false by default?

You are almost correct, but looking in the wrong place. The JdbcMetadata we are fetching via connection.getMetaData() comes from the Druid JDBC driver (because that's the driver underlying the connection).
So the correct place to check is why Druid's JDBC driver (https://github.com/apache/calcite-avatica) reports storing upper case as true when that differs from its actual behaviour.

@hashhar hashhar self-requested a review March 7, 2021 14:03
@dheerajkulakarni
Member Author

Hey @hashhar, I thought the TrinoDataMetaData class was a kind of base for all the JdbcMetaData classes in all the drivers and was referring to it; maybe I was completely wrong. If we agree that it has to return false by default, then instead of trying to set this property in the test cases I will try to find out why this returns true instead of false, and correct it if possible.

@hashhar
Member

hashhar commented Mar 7, 2021

I think the driver is lying (because it uses a translation layer, Apache Calcite, instead of being a focused, purpose-built JDBC driver).

If we know Druid's behaviour when storing to be one of case-preserving, upper-casing, or lower-casing, then you can maybe add a copy of the toRemoteSchemaName and toRemoteTableName methods inside trino-druid for now and file an issue with the JDBC driver to fix JdbcMetadata for Druid.

This turned out to be a bit more complex than expected. Thanks for putting the effort into this @dheerajkulakarni.

@findepi findepi changed the title added support for querying tables with upper case character names Support tables with non-lowercase names in Druid Mar 8, 2021
@hashhar
Member

hashhar commented Mar 8, 2021

Looks like the Avatica driver has some connection properties that affect the JdbcMetadata. See https://calcite.apache.org/docs/adapter.html#jdbc-connect-string-parameters.

The default behavior is inherited from Oracle (see the lex property). You might need to change DruidJdbcClientModule to set the proper connection properties before opening a connection from the factory (see MySqlClientModule for an example).

The relevant ones to set are either lex, or a combination of caseSensitive=TRUE, quoting=DOUBLE_QUOTE, quotedCasing=UNCHANGED, and unquotedCasing=TO_LOWER.
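For illustration, building those Avatica connection properties might look like this (a sketch only; the actual wiring inside DruidJdbcClientModule would differ, and whether Druid's driver honours these parameters end-to-end is exactly what was in question here):

```java
import java.util.Properties;

public class AvaticaConnectionPropertiesSketch
{
    // Build the Calcite/Avatica connect-string parameters suggested above.
    // These would typically be handed to the driver when the connection
    // factory opens a connection.
    public static Properties druidConnectionProperties()
    {
        Properties properties = new Properties();
        properties.setProperty("caseSensitive", "TRUE");
        properties.setProperty("quoting", "DOUBLE_QUOTE");
        properties.setProperty("quotedCasing", "UNCHANGED");
        properties.setProperty("unquotedCasing", "TO_LOWER");
        return properties;
    }

    public static void main(String[] args)
    {
        // e.g. DriverManager.getConnection(jdbcUrl, druidConnectionProperties())
        System.out.println(druidConnectionProperties());
    }
}
```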

@dheerajkulakarni
Member Author

Thank you so much for the info @hashhar, I will go through this and will make the code changes accordingly!

@dheerajkulakarni dheerajkulakarni marked this pull request as ready for review March 10, 2021 16:06
@dheerajkulakarni
Member Author

@hashhar as discussed, since there is no way to set these properties at the JDBC driver level, I have overridden the toRemoteSchema and toRemoteTable functions in DruidJdbcClient.
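A rough shape for that override (a hypothetical sketch; the real remote-name methods in BaseJdbcClient take more arguments, such as the connection and identifier mapping):

```java
public class DruidRemoteNamesSketch
{
    // Sketch: since Druid stores datasource names case-sensitively and the
    // Avatica driver misreports its identifier-casing behaviour, the Druid
    // client can bypass the metadata-driven case folding entirely and pass
    // names through unchanged. Method names here are illustrative.
    public static String toRemoteSchemaName(String schemaName)
    {
        // do not fold case based on DatabaseMetaData.storesUpperCaseIdentifiers()
        return schemaName;
    }

    public static String toRemoteTableName(String remoteSchema, String tableName)
    {
        return tableName;
    }

    public static void main(String[] args)
    {
        System.out.println(toRemoteSchemaName("druid"));
        System.out.println(toRemoteTableName("druid", "camelcase"));
    }
}
```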

@hashhar
Member

hashhar commented Mar 10, 2021

Let's also create an issue with either Druid (or, more correctly, Apache Calcite, since they build and ship the Avatica driver) to allow the relevant configuration properties to be passed to Calcite so that JdbcMetadata doesn't lie about the actual behaviour.

Thanks for the work, I will review this.

Member

@hashhar hashhar left a comment


Some changes requested.

In particular, the testTableNameClash test seems incorrect.

Member


Works until it breaks.
Can we read the indexTask as a JsonNode, replace the two nodes we are interested in, and return the modified JsonNode as a string?
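The suggested approach, sketched with Jackson (which the Trino codebase already depends on); the field path below is illustrative, not Druid's actual ingestion-spec layout:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

public class IndexTaskRewriteSketch
{
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Parse the index task spec as a JSON tree, replace the field of
    // interest, and serialize it back, instead of doing string replacement
    // on the raw JSON text.
    public static String renameDataSource(String indexTaskJson, String newName)
            throws Exception
    {
        JsonNode root = MAPPER.readTree(indexTaskJson);
        // replace the datasource-name node (path is illustrative)
        ((ObjectNode) root.at("/spec/dataSchema")).put("dataSource", newName);
        return MAPPER.writeValueAsString(root);
    }

    public static void main(String[] args)
            throws Exception
    {
        String spec = "{\"spec\":{\"dataSchema\":{\"dataSource\":\"region\"}}}";
        System.out.println(renameDataSource(spec, "camelcase"));
    }
}
```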

Member


I would like the method name to be clearer, but I can't think of anything better.

We are basically taking an existing index task, changing the destination name and ingesting under the new name. Something like a CREATE TABLE AS SELECT * FROM file.

import static java.util.Objects.requireNonNull;

final class RemoteTableNameCacheKey
public final class RemoteTableNameCacheKey
Member


Please create a GitHub issue here mentioning the reason why you did it this way and what needs to be fixed before we can do it the "correct" way.

Add a TODO comment here referring to that issue so that we can eventually clean up the code instead of changing the API in unneeded ways.

Member


Same TODO over the now public constructor.

if (tableHandles.isEmpty()) {
return Optional.empty();
}

Member


nit: revert whitespace changes.

try (ResultSet resultSet = getTables(connection, Optional.of(remoteSchema), Optional.of(remoteTable))) {
List<JdbcTableHandle> tableHandles = new ArrayList<>();
while (resultSet.next()) {
tableHandles.add(new JdbcTableHandle(
Member


Suggested change
tableHandles.add(new JdbcTableHandle(
tableHandles.add(new JdbcTableHandle(
schemaTableName,
getRemoteTable(resultSet));
private static RemoteTableName getRemoteTable(ResultSet resultSet)
{
return new RemoteTableName(
Optional.of(DRUID_CATALOG),
Optional.ofNullable(resultSet.getString("TABLE_SCHEM")),
resultSet.getString("TABLE_NAME"));
}

.row("shippriority", "bigint", "", "") // Druid doesn't support int type
.row("totalprice", "double", "", "")
.build();
MaterializedResult actualColumns = computeActual("DESCRIBE " + "CamelCase");
Member


No need for string concatenation here.

.row("totalprice", "double", "", "")
.build();
MaterializedResult actualColumns = computeActual("DESCRIBE " + "CamelCase");
Assert.assertEquals(actualColumns, expectedColumns);
Member


nit: static import.

copyAndIngestTpchData(getQueryRunner().execute(SELECT_FROM_REGION + " LIMIT 10"), this.druidServer, "region", "camelcase");
}
catch (AssertionError e) {
Assert.assertEquals(e.getMessage(), "Datasource camelcase not loaded expected [true] but found [false]");
Member


Can you share the entire stack trace?

This exception itself can also happen if something fails on the Druid end rather than on our side.

throws IOException, InterruptedException
{
try {
//ingesting data with already existing table name in lowercase which should fail
Member


I'd expect ingestion to work since Druid is case-sensitive, but querying such tables should fail from Trino.

MaterializedResult materializedRows = computeActual("SELECT * FROM druid.druid.CAMELCASE");
Assert.assertEquals(materializedRows.getRowCount(), 10);
MaterializedResult materializedRows1 = computeActual("SELECT * FROM druid.CamelCase");
MaterializedResult materializedRows2 = computeActual("SELECT * FROM druid.camelcase");
Member


Can be simplified using assertQuery("SELECT COUNT(1) FROM druid.druid.camelcase", "VALUES 10")

@bitsondatadev
Member

👋 @dheerajkulakarni - this PR is inactive and doesn't seem to be under development. If you'd like to continue work on this at any point in the future, feel free to re-open.
